This is the R vignette of a Single-Cell RNA-Seq Transcriptomics project. In this vignette we will start from a count table, with counts from project SRS3044261. Data can be retrieved from https://panglaodb.se/view_data.php?sra=SRA653146&srs=SRS3044261.
The analysis is performed on a dataset of lung cells (protocol: 10x chromium). Single cells were sequenced on the Illumina NovaSeq 6000.
library(Seurat)
library(dplyr)
#devtools::install_github("thomasp85/patchwork")
library(patchwork)
library(ggplot2)
Let’s load the count table.
exp.mat <- read.table(file = "SRA653146_SRS3044261.mat", header = TRUE, as.is = TRUE, row.names = 1)
The row names of the table are the “annoying” x_ENSMUSG IDs with the number at the end. Let’s clean them.
ensembl <- gsub("\\_E.*", "", rownames(exp.mat))
ensembl <- make.names(ensembl, unique=TRUE)
rownames(exp.mat) <- ensembl
We create our Seurat object and complete the initialization steps.
lung <- CreateSeuratObject(counts = exp.mat, project = "lungmm", min.cells = 3, min.features = 200)
Feature names cannot have underscores ('_'), replacing with dashes ('-')
lung
An object of class Seurat
20730 features across 2977 samples within 1 assay
Active assay: RNA (20730 features, 0 variable features)
min.cells is the minimum number of cells in which a gene can be detected; min.features is the minimum number of genes that have to be expressed in a cell: genes/cells not satisfying these constraints are discarded a priori. In all the table will contain 2,977 cells and 20,730 “active” genes (“features”) after filtering.
The first step is the quality control on cells. We use the set of all genes with name starting with “mt” to identify mitochondrial genes (data are from mouse).
unlist(rownames(exp.mat), use.names = FALSE)[grep("^mt", unlist(rownames(exp.mat), use.names = FALSE))]
[1] "mt.Atp6" "mt.Co1" "mt.Co2" "mt.Co3" "mt.Cytb" "mt.Nd1" "mt.Nd2" "mt.Nd3" "mt.Nd4" "mt.Nd4l" "mt.Nd5" "mt.Nd6" "mt.Rnr1" "mt.Rnr2"
[15] "mt.Tc" "mt.Tn" "mt.Tp"
Percentage of counts originating from a set of genes:
lung[["percent.mt"]] <- PercentageFeatureSet(lung, pattern = "^mt")
head(lung@meta.data, 5)
Let’s visualize QC metrics, and use these to filter cells.
VlnPlot(lung, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
FeatureScatters:
plot1 <- FeatureScatter(lung, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(lung, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2
We filter cells that have the number of expressed genes over 4,500 (doublets) or less than 200 (low quality cells/empty droplets), and we filter cells that have > 10% mitochondrial counts (low quality cells).
lung <- subset(lung, subset = nFeature_RNA > 200 & nFeature_RNA < 4500 & percent.mt < 10)
After removing unwanted cells from the dataset, the next step is to normalize the data.
lung <- NormalizeData(lung) #default parameters: normalization.method = "LogNormalize", scale.factor = 10000
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
The next step is to restrict the gene set to the “most variable” genes: those genes that exhibit the highest cell-to-cell variation in the dataset. By default the top 2000 variable genes are kept for all downstream analyses. Changing this number clearly will change the final results.
lung <- FindVariableFeatures(lung, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(lung), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(lung)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
When using repel, set xnudge and ynudge to 0 for optimal results
plot1+plot2
Top 10 most variable genes:
top10
[1] "Lyz2" "Chil3" "Cd74" "Wfdc2" "Cbr2" "Sftpc" "Ccl6" "Sftpb" "Sftpa1" "Ear2"
All log-normalized counts are transformed so that they have mean 0 and variance 1 across all cells, regardless of the count values (high or low). This in practice correspond to a “binarization” of the data, or rather “ternarization”, where in general for each cell a gene will be tend “up” (>0), “average” (=0) or “down” (<0) with very close values.
all.genes <- rownames(lung)
lung <- ScaleData(lung, features = all.genes)
The list of S and G2M specific genes are already pre-loaded in Seurat, in “cc.genes”. We can segregate this list into markers of G2/M phase and markers of S phase.
s.genes <- stringr::str_to_title(cc.genes$s.genes) # stringr::str_to_title(): fro ex. from "PCNA" to "Pcna" since we are working on Mm genes
g2m.genes <- stringr::str_to_title(cc.genes$g2m.genes)
lung <- CellCycleScoring(lung, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
The following features are not present in the object: Fen1, Mlf1ip, not searching for symbol synonyms
View cell cycle scores and phase assignments.
head(lung[[]])
Running a PCA on cell cycle genes reveals that cells do not separate entirely by phase -> no need to regress them out.
lung <- RunPCA(lung, features = c(s.genes, g2m.genes))
The following 2 features requested have not been scaled (running reduction without them): Fen1, Mlf1ipYou're computing too large a percentage of total singular values, use a standard svd instead.PC_ 1
Positive: Birc5, Top2a, Mki67, Cdk1, Cdca3, Tacc3, Kif23, Kif11, Ube2c, Uhrf1
Tpx2, Rrm2, Hmmr, Ccnb2, Aurkb, Cdca8, Nusap1, Dlgap5, Cks2, Cenpe
Cenpa, Ncapd2, Ckap2l, Rad51ap1, Hmgb2, Kif20b, Ckap2, Cks1b, Ect2, Kif2c
Negative: Msh2, Cbx5, Ccne2, Ttk, Brip1, Ubr7, Cdc6, Hjurp, Slbp, Pold3
Cdc25c, Casp8ap2, Exo1, Chaf1b, Cdca7, Mcm4, Rpa2, Ckap5, G2e3, Wdr76
Ctcf, Tipin, Ung, Tubb4b, Blm, Usp1, Nasp, Lbr, Cdc45, Psrc1
PC_ 2
Positive: Mcm6, Mcm5, Pcna, Hells, Hn1, Hmgb2, Rfc2, Ung, Dtl, Mcm4
Atad2, Gmnn, Nasp, Mcm2, Uhrf1, Pola1, Slbp, Cdca7, Smc4, Rpa2
Prim1, Casp8ap2, Lbr, Chaf1b, Usp1, Rrm1, Cdc45, Blm, Rrm2, Wdr76
Negative: Ccnb2, Cdca3, Kif2c, Ube2c, Ckap2l, Cenpa, Cdc20, Cenpf, Hmmr, Nek2
Tpx2, Cenpe, Fam64a, Ect2, Birc5, Kif23, Nusap1, Psrc1, Aurka, Cdca2
G2e3, Cbx5, Ckap2, Dlgap5, Aurkb, Bub1, Cks2, Cks1b, Gas2l3, Tacc3
PC_ 3
Positive: Cbx5, Uhrf1, Rrm1, Tyms, Cdk1, Hells, Rrm2, Hjurp, Dtl, E2f8
Atad2, Cdc6, Tipin, Cdca7, Top2a, Ncapd2, Cks1b, Cdc45, Clspn, Pola1
Kif11, Mcm5, Mcm2, Rad51, Gins2, Blm, Cdca8, Dscc1, Tacc3, Mki67
Negative: Hmgb2, Hn1, Cks2, Lbr, Rfc2, Smc4, Anp32e, Mcm6, Cenpa, Rpa2
Rangap1, Slbp, Cdc20, Ubr7, Msh2, Tubb4b, Kif2c, Nek2, Fam64a, Ctcf
G2e3, Prim1, Cenpf, Ect2, Cdca3, Kif20b, Ccnb2, Pold3, Casp8ap2, Pcna
PC_ 4
Positive: Tubb4b, Ckap5, Cbx5, Tmpo, Ctcf, Usp1, Casp8ap2, Pold3, Cks1b, Hjurp
G2e3, Pola1, Cdc6, Aurka, Gmnn, Ubr7, Anp32e, Nasp, Pcna, Tipin
Kif20b, Mcm5, Cdc20, Atad2, Ccnb2, Hn1, Psrc1, Cdca8, Nek2, Rrm1
Negative: Ncapd2, Mcm4, Slbp, Uhrf1, E2f8, Hmgb2, Rrm2, Cdca7, Tpx2, Smc4
Gins2, Cdk1, Mki67, Aurkb, Dtl, Cks2, Cdca3, Clspn, Birc5, Kif11
Top2a, Brip1, Rad51, Wdr76, Gas2l3, Tacc3, Nuf2, Ung, Hmmr, Cdc45
PC_ 5
Positive: Pold3, Mcm2, Wdr76, Hjurp, Pola1, Gmnn, Blm, Rangap1, Tyms, Smc4
Rpa2, Anln, Casp8ap2, Lbr, Dlgap5, Rad51ap1, Mcm5, Hells, Hn1, Dscc1
Rrm2, Mki67, Cenpa, Rfc2, Cdca8, Ccnb2, Brip1, Ckap2, Top2a, Tubb4b
Negative: Mcm4, Prim1, Ung, Cbx5, Cdc45, Cdca7, Cdc6, Ctcf, Nasp, G2e3
Slbp, Usp1, Mcm6, Tipin, Gins2, Ckap5, Dtl, Rrm1, Aurka, Cks2
Msh2, Gas2l3, Hmgb2, Tmpo, Psrc1, Cks1b, Hmmr, Clspn, Tacc3, Uhrf1
DimPlot(lung)
The next step is the dimensionality reduction, on scaled counts and on the 2000 most variable genes extracted before, with PCA.
lung <- RunPCA(lung, features = VariableFeatures(object = lung))
PC_ 1
Positive: Igfbp7, Sparc, Mgp, Col3a1, Col1a1, Col1a2, Timp3, Apoe, Nbl1, Inmt
Mfap5, Nupr1, Clec3b, Tmem176a, Id3, Fbln1, Timp2, Cebpd, Mfap4, Cfh
Igfbp6, Lrp4, Rarres2, Mylk, Eln, Fstl1, Jun, Cyr61, Tcf21, Gas6
Negative: Fcer1g, Tyrobp, Cd52, Rac2, Coro1a, Laptm5, Arhgdib, Nkg7, Ptpn18, Hcst
Selplg, Cd53, Ptprc, Gzma, Ptprcap, Klrd1, Ccl5, Ms4a4b, Klre1, Klrb1c
Fxyd5, Klrk1, Ncr1, Itgb2, RP23.52N2.1, Sh3bgrl3, Prf1, Arhgap45, Gimap4, Ctsw
PC_ 2
Positive: Dynlrb2, RP23.100C7.3, Fam183b, RP23.96I9.2, Sec14l3, Tmem212, RP23.167L1.8, Ccdc153, AC132584.3, Arhgdig
RP24.73K7.1, RP23.430H1.2, Lrrc10b, RP23.281F13.4, RP24.160I20.1, Cfap126, Erich2, Meig1, Cyp2f2, Krt8
Rsph1, Tctex1d4, Cldn3, Cyp2s1, Cbr2, Cxcl17, Mlf1, Smim22, AC157761.1, Efcab10
Negative: Mgp, Inmt, Igfbp7, Sparc, Apoe, Col1a1, Col3a1, Col1a2, Mfap4, Mfap5
Cxcl14, Clec3b, Lrp4, Cdo1, Timp2, Gas7, G0s2, Fbln1, Tcf21, Cebpd
Mylk, Npnt, Nbl1, Cfh, AC161169.2, Tubb5, Cyr61, Eln, Timp3, Fstl1
PC_ 3
Positive: Ear2, Mpeg1, Chil3, Cybb, Ctss, Mcemp1, Ccl6, Clec7a, Alox5ap, Ltc4s
Ear1, Lyz2, Mrc1, Atp6v0d2, Il1rn, Plet1, Cd68, Cd300d, Spi1, Krt79
Bcl2a1a, Clec4n, Fabp1, Olr1, Csf2rb, Nceh1, Sirpb1c, Sirpa, Klra2, Wfdc21
Negative: Nkg7, Gzma, Klre1, Klrb1c, Ncr1, Klrk1, Ms4a4b, Klrd1, Prf1, Ccl5
Ptprcap, Il2rb, Gimap4, Gzmb, Ctsw, RP23.52N2.1, Klra13.ps, Klra9, Cma1, Sept1
Lck, Klra8, Klrc2, H2.Q7, Serpinb9, Cst7, Klra11.ps, Klri2, Klra4, Il18r1
PC_ 4
Positive: S100a6, Igfbp6, Clec3b, Cygb, Cfh, Dcn, Mfap5, Igfbp7, Ly6a, Adamts2
C3, Mgp, Cd34, Fbln1, Pi16, Gas1, Timp3, Serpinf1, Col14a1, Timp2
Col1a2, Cxcl12, Aebp1, Gpc3, Gas7, Igfbp4, Scara5, Htra3, Mmp2, Serpina3n
Negative: Cox4i2, Higd1b, Postn, RP23.250M11.8, Fam162b, Gucy1b3, Vtn, Lipg, Gucy1a3, Itga1
Notch3, Hbegf, Ndufa4l2, Vsnl1, Tmem178, Emid1, Gm26580, Art3, Rgcc, Trpc6
Adcy8, Pde5a, Agtr1a, Pdzd2, Lmcd1, Gja4, Gap43, Ltbp2, Tusc5, Mcam
PC_ 5
Positive: Npnt, Cdo1, Inmt, Cxcl14, G0s2, Gja5, Wisp2, Slc7a10, Ces1d, AC167245.5
Mfap4, Lrat, Tcf21, Lrp4, Tmem176a, Colq, Mylk, AC161169.2, Apoe, Hspb1
Cp, Prdx6, Gadd45b, Gsta3, Vcam1, Trf, RP23.189B9.5, Plin2, Cpa3, RP23.375N21.2
Negative: Dcn, Meg3, Ahnak2, Ly6a, Pi16, Col14a1, Aldh1a2, Crip1, Prss23, Scara5
Il33, Entpd2, Igfbp4, Serpinf1, Rspo1, Aqp1, Ebf1, Slco2a1, Cd248, Gpm6a
Ly6c1, Upk3b, RP23.162H6.1, Cpe, Anxa3, Ccl11, Nkain4, Msln, Aebp1, Lrrn4
print(lung[["pca"]], dims = 1:5, nfeatures = 5)
PC_ 1
Positive: Igfbp7, Sparc, Mgp, Col3a1, Col1a1
Negative: Fcer1g, Tyrobp, Cd52, Rac2, Coro1a
PC_ 2
Positive: Dynlrb2, RP23.100C7.3, Fam183b, RP23.96I9.2, Sec14l3
Negative: Mgp, Inmt, Igfbp7, Sparc, Apoe
PC_ 3
Positive: Ear2, Mpeg1, Chil3, Cybb, Ctss
Negative: Nkg7, Gzma, Klre1, Klrb1c, Ncr1
PC_ 4
Positive: S100a6, Igfbp6, Clec3b, Cygb, Cfh
Negative: Cox4i2, Higd1b, Postn, RP23.250M11.8, Fam162b
PC_ 5
Positive: Npnt, Cdo1, Inmt, Cxcl14, G0s2
Negative: Dcn, Meg3, Ahnak2, Ly6a, Pi16
VizDimLoadings(lung, dims = 1:2, reduction = "pca")
We visualize the results on the first two PC.
DimPlot(lung, reduction = "pca")
We can plot the data projected on any of the PCA dimensions.
DimPlot(lung, reduction = "pca", dims = c(3,4))
Heatmap of the most variable genes on the most variable cells in the first PC:
DimHeatmap(lung, dims = 1, cells = 500, balanced = TRUE)
There are several genes with a clear difference of expression across the most variable cells, splitting them into two main groups. The same can be done for the other PCs.
DimHeatmap(lung, dims = 1:9, cells = 500, balanced = TRUE)
Next step is to determine the “dimensionality” of the dataset. “Significant” PCs in JackStrawPlot() will show a strong enrichment of features with low p-values (solid curve above the dashed line).
lung <- JackStraw(lung, num.replicate = 100)
lung <- ScoreJackStraw(lung, dims = 1:20)
JackStrawPlot(lung, dims = 1:20)
An alternative heuristic method generates an “Elbow plot”. Here we can observe an “elbow” around PC20, suggesting that the majority of true signal is captured in the first 20 PCs.
ElbowPlot(lung)
Setting resolution between 0.4-1.2 typically returns good results for single-cell datasets of around 3K cells. Optimal resolution often increases for larger datasets. The clusters can be found using the Idents function.
lung <- FindNeighbors(lung, dims = 1:20)
Computing nearest neighbor graph
Computing SNN
lung <- FindClusters(lung, resolution = 0.5)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 2935
Number of edges: 101819
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9060
Number of communities: 14
Elapsed time: 0 seconds
head(Idents(lung), 5)
AAACCTGAGACAAAGG AAACCTGAGACTAAGT AAACCTGAGCGTCTAT AAACCTGAGCTGGAAC AAACCTGCATCGGTTA
0 0 3 0 4
Levels: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
Now the PCA plot is labeled with clusters.
DimPlot(lung, reduction = "pca")
UMAP:
#reticulate::py_install(packages ='umap-learn')
lung <- RunUMAP(lung, dims = 1:20)
The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session15:15:57 UMAP embedding parameters a = 0.9922 b = 1.112
15:15:57 Read 2935 rows and found 20 numeric columns
15:15:57 Using Annoy for neighbor search, n_neighbors = 30
15:15:57 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:15:57 Writing NN index file to temp file /var/folders/4k/0lqzcm0545d30b5nyr_dptyh0000gn/T//RtmpO3l2gf/file3a615425e71
15:15:57 Searching Annoy index using 1 thread, search_k = 3000
15:15:58 Annoy recall = 100%
15:16:00 Commencing smooth kNN distance calibration using 1 thread
15:16:03 Initializing from normalized Laplacian + noise
15:16:03 Commencing optimization for 500 epochs, with 118972 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:16:08 Optimization finished
DimPlot(lung, reduction = "umap")
We visualize here a table containing (identified by number) the number of cells belonging to each cluster, useful to be compared to the “cluster table” on PanglaoDB (https://panglaodb.se/list_clusters_and_cell_types.html?sra=SRA653146&srs=SRS3044261).
table(Idents(lung))
0 1 2 3 4 5 6 7 8 9 10 11 12 13
676 527 414 350 343 270 105 77 59 29 27 20 19 19
Let’s save the object.
saveRDS(lung, file = "lung.rds")
The final step is to give an “identity” to the clusters. That is, find which are the “marker genes” (expressed exclusively, or at least over-expressed) in each cluster with respect to the others. Then, trying to figure out, according to the marker genes of each cluster, what could be the corresponding cell type.
We can find the “differentially expressed genes” for a cluster against the others, with the additional condition that the gene has to be expressed in at least the 25% of the cells in the cluster:
cluster1.markers <- FindMarkers(lung, ident.1 = 1, min.pct = 0.25)
head(cluster1.markers, n = 5)
Or choose to compare explicitly two or more clusters. Here cluster 5 is compared to clusters 0 and 3.
cluster5.markers <- FindMarkers(lung, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)
Or, find the markers of all clusters, with respect to the others (this will take longer). The results are stored in a table with two genes for each cluster. Here the criteria are a) expressed in at least 25% of the cells and b) with a log2 fold change of at least 0.25.
lung.markers <- FindAllMarkers(lung, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
lung.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_log2FC)
There are several tools for visualizing marker expression. VlnPlot() (shows expression probability distributions across clusters), and FeaturePlot() are most commonly used visualizations. There are also RidgePlot() and DotPlot() as additional methods to view our dataset.
# Violin plot of the expression of the best "marker gene" for each cluster:
# VlnPlot(lung, features = features = c("Slc7a10", "Gzma", "Rbp1", "Pi16", "Vcam1", "Mustn1", "Cox4i2", "Ear2", "Ms4a6c", "Igkc", "Sftpa1", "Ak7", "Aqp1", "Rac2")) splitted into (for better visualization):
VlnPlot(lung, features = c("Slc7a10", "Gzma", "Rbp1"))
VlnPlot(lung, features = c("Pi16", "Vcam1", "Mustn1"))
VlnPlot(lung, features = c("Cox4i2", "Ear2", "Ms4a6c"))
VlnPlot(lung, features = c("Igkc", "Sftpa1", "Ak7"))
VlnPlot(lung, features = "Aqp1")
VlnPlot(lung, features = "Rac2") # Unknown
#FeaturePlot(lung, features = c("Slc7a10", "Gata3", "Gzma", "Ccl5", "Ifng", "Rbp1", "Pi16", "Vcam1", "Mustn1", "Acta2", "Cox4i2", "Higd1b", "Chil3", "Ear1", "Ear2", "Ms4a6c", "Ccr7", "Igkc", "Sftpa1", "Sftpc", "Sfta2", "Egfl6", "Lamp3", "Ak7", "Aqp1", "Bcam", "Rac2", "Atf3")) splitted into (for better visualization):
FeaturePlot(lung, features = c("Slc7a10", "Gata3", "Gzma", "Ccl5"))
FeaturePlot(lung, features = c("Ifng", "Rbp1", "Pi16", "Vcam1"))
FeaturePlot(lung, features = c("Mustn1", "Acta2", "Cox4i2", "Higd1b"))
FeaturePlot(lung, features = c("Chil3", "Ear1", "Ear2", "Ms4a6c"))
FeaturePlot(lung, features = c("Ccr7", "Igkc", "Sftpa1", "Sftpc"))
FeaturePlot(lung, features = c("Sfta2", "Egfl6", "Lamp3", "Ak7"))
FeaturePlot(lung, features = c("Aqp1", "Bcam", "Rac2", "Atf3"))
RidgePlot(lung, features = c("Gzma", "Ear2"), ncol = 2) # Ridge plots from ggridges. Visualize single cell expression distributions in each cluster
DotPlot(lung, features = c("Slc7a10", "Gata3", "Gzma", "Ccl5", "Ifng", "Rbp1", "Pi16", "Vcam1", "Mustn1", "Acta2", "Cox4i2", "Higd1b", "Chil3", "Ear1", "Ear2", "Ms4a6c", "Ccr7", "Igkc", "Sftpa1", "Sftpc", "Sfta2", "Egfl6", "Lamp3", "Ak7", "Aqp1", "Bcam", "Rac2", "Atf3", "Irf8")) + RotatedAxis() # the size of the dot corresponds to the percentage of cells expressing the feature in each cluster. The color represents the average expression level
DoHeatmap() generates an expression heatmap for given cells and features. In this case, we are plotting the top 10 markers for each cluster.
lung.markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) -> top10
DoHeatmap(lung, features = top10$gene) + NoLegend()
new.cluster.ids <- c("Fibroblasts", "NK", "Fibroblasts", "Fibroblasts", "Fibroblasts", "SMCs", "Pericytes", "Alveolar m.", "Macrophages", "B", "PneumocytesII", "Ependymal", "Endothelial", "Unknown")
names(new.cluster.ids) <- levels(lung)
lung <- RenameIdents(lung, new.cluster.ids)
DimPlot(lung, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
saveRDS(lung, file = "lung_final.rds")
Single cell heatmap showing the expression of best “marker gene” for each cluster overlayed on the UMAP projection of the cells.
DoHeatmap(subset(lung, downsample = 100), features = c("Slc7a10", "Gzma", "Rbp1", "Pi16", "Vcam1", "Mustn1", "Cox4i2", "Ear2", "Ms4a6c", "Igkc", "Sftpa1", "Ak7", "Aqp1", "Rac2"), size = 3)
This analysis was conducted on:
sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.3.5 patchwork_1.1.1 dplyr_1.0.7 SeuratObject_4.0.2 Seurat_4.0.3
[6] SummarizedExperiment_1.20.0 Biobase_2.50.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 IRanges_2.24.1
[11] S4Vectors_0.28.1 BiocGenerics_0.36.1 MatrixGenerics_1.2.1 matrixStats_0.59.0 edgeR_3.32.1
[16] limma_3.46.0
loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_2.0-2 deldir_0.2-10 ellipsis_0.3.2 ggridges_0.5.3 XVector_0.30.0
[7] rstudioapi_0.13 spatstat.data_2.1-0 leiden_0.3.8 listenv_0.8.0 ggrepel_0.9.1 fansi_0.5.0
[13] codetools_0.2-18 splines_4.0.4 knitr_1.33 polyclip_1.10-0 jsonlite_1.7.2 ica_1.0-2
[19] cluster_2.1.2 png_0.1-7 uwot_0.1.10 shiny_1.6.0 sctransform_0.3.2 spatstat.sparse_2.0-0
[25] compiler_4.0.4 httr_1.4.2 assertthat_0.2.1 Matrix_1.3-4 fastmap_1.1.0 lazyeval_0.2.2
[31] cli_3.0.0 later_1.2.0 htmltools_0.5.1.1 tools_4.0.4 igraph_1.2.6 gtable_0.3.0
[37] glue_1.4.2 GenomeInfoDbData_1.2.4 RANN_2.6.1 reshape2_1.4.4 Rcpp_1.0.6 scattermore_0.7
[43] vctrs_0.3.8 nlme_3.1-152 lmtest_0.9-38 xfun_0.24 stringr_1.4.0 globals_0.14.0
[49] mime_0.11 miniUI_0.1.1.1 lifecycle_1.0.0 irlba_2.3.3 goftest_1.2-2 future_1.21.0
[55] zlibbioc_1.36.0 MASS_7.3-54 zoo_1.8-9 scales_1.1.1 spatstat.core_2.2-0 promises_1.2.0.1
[61] spatstat.utils_2.2-0 RColorBrewer_1.1-2 yaml_2.2.1 reticulate_1.20 pbapply_1.4-3 gridExtra_2.3
[67] rpart_4.1-15 stringi_1.6.2 rlang_0.4.11 pkgconfig_2.0.3 bitops_1.0-7 evaluate_0.14
[73] lattice_0.20-44 ROCR_1.0-11 purrr_0.3.4 tensor_1.5 htmlwidgets_1.5.3 cowplot_1.1.1
[79] tidyselect_1.1.1 parallelly_1.26.1 RcppAnnoy_0.0.18 plyr_1.8.6 magrittr_2.0.1 R6_2.5.0
[85] generics_0.1.0 DelayedArray_0.16.3 DBI_1.1.1 withr_2.4.2 pillar_1.6.1 mgcv_1.8-36
[91] fitdistrplus_1.1-5 survival_3.2-11 abind_1.4-5 RCurl_1.98-1.3 tibble_3.1.2 future.apply_1.7.0
[97] crayon_1.4.1 KernSmooth_2.23-20 utf8_1.2.1 spatstat.geom_2.2-0 plotly_4.9.4.1 rmarkdown_2.9
[103] locfit_1.5-9.4 grid_4.0.4 data.table_1.14.0 digest_0.6.27 xtable_1.8-4 tidyr_1.1.3
[109] httpuv_1.6.1 munsell_0.5.0 viridisLite_0.4.0
Hao, Hao, et al., Cell 2021; Seurat V4; https://satijalab.org/seurat/
Giulio Pavesi; Seurat example on 10x Data (2021 update); http://159.149.160.56/Transcriptomics/seurat.html